Improving Text Segmentation by Combining Endogenous and Exogenous Methods

نویسنده

  • Olivier Ferret
چکیده

Topic segmentation was addressed by a large amount of work from which it is not easy to draw conclusions, especially about the need for knowledge. In this article, we propose to combine in the same framework two methods for improving the results of a topic segmenter based on lexical reiteration. The first one is endogenous and exploits the distributional similarity of words in a document for discovering its topics. These topics are then used to facilitate the detection of topical similarity between discourse units. The second approach achieves the same goal by relying on external resources. Two resources are tested: a network of lexical co-occurrences built from a large corpus and a set of word senses induced from this network. An evaluation of the two approaches and their combination is performed in a reference framework and shows the interest of this combination both for French and English.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Advanced exergy and exergoeconomic analysis of the integrated structure of simultaneous production of NGL recovery and liquefaction

Advanced exergoeconomic analysis is applied on process configurations for co-production of LNG and NGL. This analysis provides information on the origin of the irreversibility as well as the irreversibility value of the process. The exergy destruction, exergy destruction cost, and also investment cost, are divided into two available/unavailable and endogenous/exogenous parts. The results of adv...

متن کامل

Spectral-spatial classification of hyperspectral images by combining hierarchical and marker-based Minimum Spanning Forest algorithms

Many researches have demonstrated that the spatial information can play an important role in the classification of hyperspectral imagery. This study proposes a modified spectral–spatial classification approach for improving the spectral–spatial classification of hyperspectral images. In the proposed method ten spatial/texture features, using mean, standard deviation, contrast, homogeneity, corr...

متن کامل

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

متن کامل

Improving Brain Magnetic Resonance Image (MRI) Segmentation via a Novel Algorithm based on Genetic and Regional Growth

Background: Regarding the importance of right diagnosis in medical applications, various methods have been exploited for processing medical images solar. The method of segmentation is used to analyze anal to miscall structures in medical imaging.Objective: This study describes a new method for brain Magnetic Resonance Image (MRI) segmentation via a novel algorithm based on genetic and regiona...

متن کامل

Combining data mining and group decision making in retailer segmentation based on LRFMP variables

Data mining is a powerful tool for firms to extract knowledge from their customers’ transaction data. One of the useful applications of data mining is segmentation. Segmentation is an effective tool for managers to make right marketing strategies for right customer segments. In this study we have segmented retailers of a hygienic manufacture. Nowadays all manufactures do understand that for st...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009